import SetupEnv from './common/setup-env.mdx';

# Automate with scripts in YAML

In most cases, developers write automation scripts just to perform some smoke tests, like checking for the appearance of certain content or verifying that a key user path is accessible. In such situations, maintaining a large test project is unnecessary.

Midscene offers a way to perform automation using `.yaml` files, which helps you focus on the script itself rather than the testing framework. This allows any team member to write automation scripts without needing to learn any API.

Here is an example. By reading its content, you should be able to understand how it works.

```yaml
web:
  url: https://www.bing.com

tasks:
  - name: Search for weather
    flow:
      - ai: Search for "today's weather"
      - sleep: 3000

  - name: Check results
    flow:
      - aiAssert: The results show weather information
```

:::info Sample Project

You can find a sample project that uses YAML scripts for automation here:

- [Web](https://github.com/web-infra-dev/midscene-example/tree/main/yaml-scripts-demo)
- [Android](https://github.com/web-infra-dev/midscene-example/tree/main/android/yaml-scripts-demo)

:::

<SetupEnv />

Alternatively, you can use a `.env` file in the same directory where you run the command to store your configuration. The Midscene command-line tool will automatically load it when running a YAML script.

```ini filename=.env
OPENAI_API_KEY="sk-abcdefghijklmnopqrstuvwxyz"
```

## Install the command-line tool

Install `@midscene/cli` globally:

```bash
npm i -g @midscene/cli
# Or install it within your project
npm i @midscene/cli --save-dev
```

Create a file named `bing-search.yaml` to drive a web browser automation task:

```yaml
web:
  url: https://www.bing.com

tasks:
  - name: Search for weather
    flow:
      - ai: Search for "today's weather"
      - sleep: 3000
      - aiAssert: The results show weather information
```

Or, to drive an Android device automation task (requires the device to be connected via adb):

```yaml
android:
  # launch: https://www.bing.com
  deviceId: s4ey59

tasks:
  - name: Search for weather
    flow:
      - ai: Open the browser and navigate to bing.com
      - ai: Search for "today's weather"
      - sleep: 3000
      - aiAssert: The results show weather information
```

Or, to drive an iOS device automation task (requires WebDriverAgent configuration):

```yaml
ios:
  # launch: com.apple.mobilesafari
  wdaPort: 8100

tasks:
  - name: Search for weather
    flow:
      - ai: Open the browser and navigate to bing.com
      - ai: Search for "today's weather"
      - sleep: 3000
      - aiAssert: The results show weather information
```

Run the script:

```bash
midscene ./bing-search.yaml
# Or if you installed Midscene in your project
npx midscene ./bing-search.yaml
```

You will see the script's execution progress and the visual report file.

## Script file structure

Script files use YAML format to describe automation tasks. Each file defines the target to be manipulated (like a webpage or an Android app) and the series of steps to perform.

A standard `.yaml` script file includes a `web`, `android`, or `ios` section to configure the environment, and a `tasks` section to define the automation tasks.

```yaml
web:
  url: https://www.bing.com

# The tasks section defines the series of steps to be executed
tasks:
  - name: Search for weather
    flow:
      - ai: Search for "today's weather"
      - sleep: 3000
      - aiAssert: The results show weather information
```

### The `web` part

```yaml
web:
  # The URL to visit, required. If `serve` is provided, provide the relative path.
  url: <url>

  # Serve a local path as a static server, optional.
  serve: <root-directory>

  # The browser user agent, optional.
  userAgent: <ua>

  # The browser viewport width, optional, defaults to 1280.
  viewportWidth: <width>

  # The browser viewport height, optional, defaults to 960.
  viewportHeight: <height>

  # The browser's device pixel ratio, optional, defaults to 1.
  deviceScaleFactor: <scale>

  # Path to a JSON format browser cookie file, optional.
  cookie: <path-to-cookie-file>

  # The strategy for waiting for network idle, optional.
  waitForNetworkIdle:
    # The timeout in milliseconds, optional, defaults to 2000ms.
    timeout: <ms>
    # Whether to continue on timeout, optional, defaults to true.
    continueOnNetworkIdleError: <boolean>

  # The path to the JSON file for outputting aiQuery/aiAssert results, optional.
  output: <path-to-output-file>

  # Whether to save log content to a JSON file, optional, defaults to `false`. If true, saves to `unstableLogContent.json`. If a string, saves to the specified path. The log content structure may change in the future.
  unstableLogContent: <boolean | path-to-unstable-log-file>

  # Whether to restrict page navigation to the current tab, optional, defaults to true.
  forceSameTabNavigation: <boolean>

  # The bridge mode, optional, defaults to false. Can be 'newTabWithUrl' or 'currentTab'. See below for more details.
  bridgeMode: false | 'newTabWithUrl' | 'currentTab'

  # Whether to close newly created tabs when the bridge disconnects, optional, defaults to false.
  closeNewTabsAfterDisconnect: <boolean>

  # Whether to ignore HTTPS certificate errors, optional, defaults to false.
  acceptInsecureCerts: <boolean>
```
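
As an illustration of how these fields combine, here is a minimal sketch that serves a local build and tests it. The `./dist` directory, the `index.html` entry, the output path, and the assertion prompt are hypothetical values picked for this example; only the field names come from the reference above.

```yaml
web:
  # Serve the (hypothetical) ./dist directory as a static site.
  serve: ./dist
  # With `serve` set, `url` is a path relative to the served root.
  url: index.html
  viewportWidth: 1440
  viewportHeight: 900
  waitForNetworkIdle:
    # Wait up to 5 seconds for the network to become idle.
    timeout: 5000
    # Keep going even if the network never becomes idle in time.
    continueOnNetworkIdleError: true
  # Hypothetical path for the aiQuery/aiAssert results.
  output: ./output/home.json

tasks:
  - name: Check the home page
    flow:
      - aiAssert: The page shows a navigation bar
```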

### Global agent configuration

If you need to use the `aiActionContext` parameter, you can set it through the global `agent` configuration:

```yaml
# Global AI agent configuration
agent:
  # Background knowledge to send to the AI model when calling aiAction, optional.
  aiActionContext: <string>
```

:::tip aiActionContext configuration

- **Applicable environments**: Web, iOS, and Android environments can all use the global `agent` configuration to set `aiActionContext`
- **Purpose**: Provides background knowledge to the AI model, like how to handle popups, business introduction, etc.

:::

#### Usage example

```yaml
# Global agent configuration, applies to all environments
agent:
  aiActionContext: "If any popup appears, click agree. If login page appears, skip it."

# iOS environment configuration
ios:
  launch: https://www.bing.com
  wdaPort: 8100

# Or Android environment configuration
android:
  deviceId: s4ey59
  launch: https://www.bing.com

tasks:
  - name: Search for weather
    flow:
      - ai: Search for "today's weather"
      - aiAssert: The results show weather information
```

### The `android` part

```yaml
android:
  # The device ID, optional, defaults to the first connected device.
  deviceId: <device-id>

  # The launch URL, optional, defaults to the device's current page.
  launch: <url>

  # The path to the JSON file for outputting aiQuery/aiAssert results, optional.
  output: <path-to-output-file>
```

### The `ios` part

```yaml
ios:
  # WebDriverAgent port, optional, defaults to 8100.
  wdaPort: <port>

  # WebDriverAgent host address, optional, defaults to localhost.
  wdaHost: <host>

  # Whether to auto dismiss keyboard, optional, defaults to false.
  autoDismissKeyboard: <boolean>

  # Launch URL or app bundle ID, optional, defaults to the device's current page.
  launch: <url-or-bundle-id>

  # The path to the JSON file for outputting aiQuery/aiAssert results, optional.
  output: <path-to-output-file>

  # Whether to save log content to a JSON file, optional, defaults to `false`. If true, saves to `unstableLogContent.json`. If a string, saves to the specified path. The log content structure may change in the future.
  unstableLogContent: <boolean | path-to-unstable-log-file>
```

### The `tasks` part

The `tasks` part is an array that defines the steps of the script. Remember to add a `-` before each step to indicate it's an array item.

The interfaces in the `flow` section are almost identical to the [API](./api.html), with some differences in parameter nesting levels.

```yaml
tasks:
  - name: <name>
    continueOnError: <boolean> # Optional, whether to continue to the next task on error, defaults to false.
    flow:
      # Auto Planning (.ai)
      # ----------------

      # Perform an interaction. `ai` is a shorthand for `aiAction`.
      - ai: <prompt>
        cacheable: <boolean> # Optional, whether to cache the result of this API call when the [caching feature](./caching.mdx) is enabled. Defaults to True.

      # This usage is the same as `ai`.
      - aiAction: <prompt>
        cacheable: <boolean> # Optional, whether to cache the result of this API call when the [caching feature](./caching.mdx) is enabled. Defaults to True.

      # Instant Action (.aiTap, .aiHover, .aiInput, .aiKeyboardPress, .aiScroll)
      # ----------------

      # Tap an element described by a prompt.
      - aiTap: <prompt>
        deepThink: <boolean> # Optional, whether to use deepThink to precisely locate the element. Defaults to False.
        xpath: <xpath> # Optional, the xpath of the target element for the operation. If provided, Midscene will prioritize this xpath to find the element before using the cache and the AI model. Defaults to empty.
        cacheable: <boolean> # Optional, whether to cache the result of this API call when the [caching feature](./caching.mdx) is enabled. Defaults to True.

      # Hover over an element described by a prompt.
      - aiHover: <prompt>
        deepThink: <boolean> # Optional, whether to use deepThink to precisely locate the element. Defaults to False.
        xpath: <xpath> # Optional, the xpath of the target element for the operation. If provided, Midscene will prioritize this xpath to find the element before using the cache and the AI model. Defaults to empty.
        cacheable: <boolean> # Optional, whether to cache the result of this API call when the [caching feature](./caching.mdx) is enabled. Defaults to True.

      # Input text into an element described by a prompt.
      - aiInput: <final text content of the input>
        locate: <prompt>
        deepThink: <boolean> # Optional, whether to use deepThink to precisely locate the element. Defaults to False.
        xpath: <xpath> # Optional, the xpath of the target element for the operation. If provided, Midscene will prioritize this xpath to find the element before using the cache and the AI model. Defaults to empty.
        cacheable: <boolean> # Optional, whether to cache the result of this API call when the [caching feature](./caching.mdx) is enabled. Defaults to True.

      # Press a key (e.g., Enter, Tab, Escape) on an element described by a prompt.
      - aiKeyboardPress: <key>
        locate: <prompt>
        deepThink: <boolean> # Optional, whether to use deepThink to precisely locate the element. Defaults to False.
        xpath: <xpath> # Optional, the xpath of the target element for the operation. If provided, Midscene will prioritize this xpath to find the element before using the cache and the AI model. Defaults to empty.
        cacheable: <boolean> # Optional, whether to cache the result of this API call when the [caching feature](./caching.mdx) is enabled. Defaults to True.

      # Scroll globally or on an element described by a prompt.
      - aiScroll:
        direction: 'up' # or 'down' | 'left' | 'right'
        scrollType: 'once' # or 'untilTop' | 'untilBottom' | 'untilLeft' | 'untilRight'
        distance: <number> # Optional, the scroll distance in pixels.
        locate: <prompt> # Optional, the element to scroll on.
        deepThink: <boolean> # Optional, whether to use deepThink to precisely locate the element. Defaults to False.
        xpath: <xpath> # Optional, the xpath of the target element for the operation. If provided, Midscene will prioritize this xpath to find the element before using the cache and the AI model. Defaults to empty.
        cacheable: <boolean> # Optional, whether to cache the result of this API call when the [caching feature](./caching.mdx) is enabled. Defaults to True.

      # Log the current screenshot with a description in the report file.
      - logScreenshot: <title> # Optional, the title of the screenshot. If not provided, the title will be 'untitled'.
        content: <content> # Optional, the description of the screenshot.

      # Data Extraction
      # ----------------

      # Perform a query that returns a JSON object.
      - aiQuery: <prompt> # Remember to describe the format of the result in the prompt.
        name: <name> # The key for the query result in the JSON output.

      # More APIs
      # ----------------

      # Wait for a condition to be met, with a timeout (in ms, optional, defaults to 30000).
      - aiWaitFor: <prompt>
        timeout: <ms>

      # Perform an assertion.
      - aiAssert: <prompt>
        errorMessage: <error-message> # Optional, the error message to print if the assertion fails.
        name: <name> # Optional, give the assertion a name, which will be used as a key in the JSON output.

      # Wait for a specified amount of time.
      - sleep: <ms>

      # Execute a piece of JavaScript code in the web page context.
      - javascript: <javascript>
        name: <name> # Optional, assign a name to the return value, which will be used as a key in the JSON output.

  - name: <name>
    flow:
      # ...
```
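
To make the placeholders above more concrete, the following sketch fills them in for a search scenario. The prompts, the timeout value, the output path, and the result keys `headlines` and `weatherShown` are hypothetical values chosen for this example; only the step and field names come from the reference.

```yaml
web:
  url: https://www.bing.com
  # Hypothetical output path; aiQuery/aiAssert results are written here.
  output: ./output/bing-search.json

tasks:
  - name: Search with instant actions
    flow:
      # Type the text into the element described by `locate`.
      - aiInput: today's weather
        locate: the search box
      - aiKeyboardPress: Enter
        locate: the search box
      # Wait up to 10 seconds instead of the default 30 seconds.
      - aiWaitFor: the search results have loaded
        timeout: 10000

  - name: Extract and verify
    # Move on to the next task even if this one fails.
    continueOnError: true
    flow:
      # Describe the expected result format in the prompt.
      - aiQuery: "the titles of the first three results, as {titles: string[]}"
        name: headlines
      - aiAssert: The results show weather information
        errorMessage: No weather information was found in the results
        name: weatherShown
```

With `output` set, the values stored under `headlines` and `weatherShown` should end up in the JSON output file.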

#### Prompting with images

For steps whose prompt accepts images (see the [API reference](./api.html#prompting-with-images)), replace the string value with an object that contains:

- `prompt`: The text prompt.
- `images`: (Optional) The reference images used in the prompt. Each image needs a `name` and a `url`.
- `convertHttpImage2Base64`: (Optional) Converts HTTP image links to Base64 before sending them to the model, which is useful when the link is not publicly accessible.

Image URLs can be local paths, Base64 strings, or remote links. When an image link is not accessible to the model, set `convertHttpImage2Base64: true` so that Midscene downloads the image and sends it to the model as a Base64 string.

For interactions like `aiTap`, `aiHover`, `aiDoubleClick`, `aiRightClick`, put the text and images in the `locate` field.

```yaml
tasks:
  - name: Verify branding
    flow:
      - aiHover:
          locate:
            prompt: Move the cursor to the region containing the GitHub logo.
            images:
              - name: GitHub logo
                url: https://github.githubassets.com/assets/GitHub-Mark-ea2971cee799.png
            convertHttpImage2Base64: true

      - aiTap:
          locate:
            prompt: Tap the region containing the GitHub logo.
            images:
              - name: GitHub logo
                url: https://github.githubassets.com/assets/GitHub-Mark-ea2971cee799.png
            convertHttpImage2Base64: true
```

For VQA steps like `aiAsk`, `aiQuery`, `aiBoolean`, `aiNumber`, `aiString`, and `aiAssert`, you can set the `prompt` and `images` fields directly.

```yaml
tasks:
  - name: Verify branding
    flow:
      - aiAssert:
          prompt: Check whether the image appears on the page.
          images:
            - name: target logo
              url: https://github.githubassets.com/assets/GitHub-Mark-ea2971cee799.png
          convertHttpImage2Base64: true
```
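
Since image URLs can also be local paths, a reference image stored next to the script can be used in the same way. This is a minimal sketch; the path `./assets/logo.png` is a hypothetical file, and how relative paths are resolved is an assumption you should verify in your setup.

```yaml
tasks:
  - name: Compare against a local reference image
    flow:
      - aiAssert:
          prompt: Check whether the reference logo appears in the page header.
          images:
            - name: reference logo
              # Hypothetical local file; no Base64 conversion flag is set
              # because the image is not fetched over HTTP.
              url: ./assets/logo.png
```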

## Advanced usage of the command-line tool

`@midscene/cli` provides flexible ways to run your automation scripts.

### Run one or more scripts

You can pass the `midscene` command either a single `.yaml` script file or a glob pattern that matches multiple `.yaml` files. This is a shorthand for the `--files` argument.

```bash
# Run a single script
midscene ./bing-search.yaml

# Use a glob pattern to run all matching scripts
midscene './scripts/**/*.yaml'
```

### Command-line options

The command-line tool provides several options to control the execution behavior of your scripts.

- `--files <file1> <file2> ...`: Specifies a list of script files to execute, which will be run in order. Supports glob patterns, following the syntax supported by [glob](https://www.npmjs.com/package/glob).
- `--concurrent <number>`: Sets the number of concurrent executions. Defaults to `1`.
- `--continue-on-error`: If this flag is set, it will continue to run the remaining scripts even if one fails. Defaults to off.
- `--share-browser-context`: Shares the same browser context (e.g., Cookies and `localStorage`) across all scripts. This is very useful for sequential tests that require a login state. Defaults to off.
- `--summary <filename>`: Specifies the path for the generated JSON format summary report file.
- `--headed`: Runs the script in a browser with a graphical user interface, rather than in headless mode.
- `--keep-window`: Keeps the browser window open after the script execution is complete. This option automatically enables `--headed` mode.
- `--config <filename>`: Specifies a configuration file. Parameters in the config file will be used as default values for the command-line arguments.
- `--web.userAgent <ua>`: Sets the browser UA, which will override the `web.userAgent` parameter in all script files.
- `--web.viewportWidth <width>`: Sets the browser viewport width, which will override the `web.viewportWidth` parameter in all script files.
- `--web.viewportHeight <height>`: Sets the browser viewport height, which will override the `web.viewportHeight` parameter in all script files.
- `--android.deviceId <device-id>`: Sets the Android device ID, which will override the `android.deviceId` parameter in all script files.
- `--ios.wdaPort <port>`: Sets the WebDriverAgent port, which will override the `ios.wdaPort` parameter in all script files.
- `--ios.wdaHost <host>`: Sets the WebDriverAgent host address, which will override the `ios.wdaHost` parameter in all script files.
- `--dotenv-debug`: Enables the debug log for dotenv. Disabled by default.
- `--dotenv-override`: Sets whether dotenv overrides global environment variables with the same name, disabled by default.

Examples:

Use the `--files` argument to specify the file order. Without `--concurrent`, the scripts run one at a time in that order.

```bash
midscene --files ./login.yaml ./buy/*.yaml ./checkout.yaml
```

Run all scripts with a concurrency of 4 and continue on any file error.

```bash
midscene --files './scripts/**/*.yaml' --concurrent 4 --continue-on-error
```

### Writing command-line arguments in a file

You can write a configuration file in YAML format and reference it with `--config`. When invoking the command-line tool, command-line arguments have higher priority than the configuration file.

```yaml
files:
  - './scripts/login.yaml'
  - './scripts/search.yaml'
  - './scripts/**/*.yaml'

concurrent: 4
continueOnError: true
shareBrowserContext: true
```

Usage:

```bash
midscene --config ./config.yaml
```

## More features

### Using environment variables in `.yaml` files

You can use environment variables in your `.yaml` files with the `${variable-name}` syntax.

For example, if you have a `.env` file with the following content:

```ini filename=.env
topic=weather today
```

You can use the environment variable in your `.yaml` file like this:

```yaml
#...
- ai: type ${topic} in input box
#...
```
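
For context, a complete script using this variable might look like the sketch below. It assumes the `.env` file shown above is in the directory where you run the command; the task name and the prompts are hypothetical.

```yaml
web:
  url: https://www.bing.com

tasks:
  - name: Search for the configured topic
    flow:
      # The placeholder below is replaced with the value of `topic`
      # from the environment ("weather today" in the .env file above).
      - ai: type ${topic} in input box
      - aiKeyboardPress: Enter
        locate: the input box
      - aiAssert: The results are related to ${topic}
```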

### Running in headed mode

> `web` scenarios only

'Headed' mode means the browser window will be visible. By default, scripts run in headless mode.

To run in headed mode, you can use the `--headed` option. Additionally, if you want to keep the browser window open after the script finishes, you can use the `--keep-window` option. The `--keep-window` option automatically enables `--headed` mode.

Headed mode consumes more resources, so it is recommended to use it only locally.

```bash
# Run in headed mode
midscene /path/to/yaml --headed

# Run in headed mode and keep the browser window open afterward
midscene /path/to/yaml --keep-window
```

### Using bridge mode

> `web` scenarios only

Bridge mode lets you use YAML scripts to automate your existing desktop browser. This is particularly useful for reusing cookies, plugins, and page states, or for combining manual interaction with automation scripts.

To use bridge mode, you first need to install the Chrome extension and then use the following configuration in the `web` section:

```diff
web:
  url: https://www.bing.com
+ bridgeMode: newTabWithUrl
```

Please refer to [Bridge Mode via Chrome Extension](./bridge-mode-by-chrome-extension) for more details.
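
Put together, a full bridge-mode script could look like this sketch. It uses only fields from the `web` reference earlier on this page; the task content is the same hypothetical search used elsewhere in this document.

```yaml
web:
  url: https://www.bing.com
  # Open the URL in a new tab of the already-running desktop browser.
  bridgeMode: newTabWithUrl
  # Close tabs created during the run once the bridge disconnects.
  closeNewTabsAfterDisconnect: true

tasks:
  - name: Search for weather
    flow:
      - ai: Search for "today's weather"
      - aiAssert: The results show weather information
```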

### Running YAML scripts with JavaScript

You can also run a YAML script using JavaScript by calling the [`runYaml`](./api.html#runyaml) method on the Agent. Note that this method will only execute the `tasks` part of the YAML script.

### Analyzing command-line tool results

After execution is complete, the following files will be generated in the output directory:

- The file path specified by the `--summary` option (defaults to `index.json`), containing the execution status and statistics of all files.
- The individual execution results of each YAML file (in JSON format).
- The visual reports for each file (in HTML format).

## Configure dotenv's default behavior

Midscene uses [`dotenv`](https://github.com/motdotla/dotenv) to load environment variables from a `.env` file.

### Disable dotenv's debug logs

dotenv's debug log is disabled by default, as listed under the command-line options. To turn it off explicitly, pass `--dotenv-debug=false`.

```bash
midscene /path/to/yaml --dotenv-debug=false
```

### Use .env to override global environment variables

By default, `dotenv` does not override global environment variables that share a name with a variable in the `.env` file. To let the `.env` values take precedence, use the `--dotenv-override` option.

```bash
midscene /path/to/yaml --dotenv-override=true
```

## FAQ

**How can I get cookies in JSON format from Chrome?**

You can use this [Chrome extension](https://chromewebstore.google.com/detail/get-cookiestxt-locally/cclelndahbckbenkjhflpdbgdldlbecc) to export cookies in JSON format.

**How can I open the debug log for dotenv?**

Midscene uses `dotenv` to load environment variables from a `.env` file. You can use the `--dotenv-debug` option to open the debug log for dotenv.

```bash
midscene /path/to/yaml --dotenv-debug=true
```

## More

You might also be interested in [Prompting Tips](./prompting-tips).
